Description of Fusariose Phenotypes

1/ Introduction

This file is a supplementary data attached with the publication. It aims to describe the phenotypes collected to study the resistance of durum wheat to fusariose.

Let’s upload the file containing all the phenotypes

#Watch out, to reproduct analysis, you have to update the path.
data=read.table("~/Dropbox/Publi_Fusariose/ANALYSIS_REPRO/DATA/PHENOTYPE/phenotypage_all_fusa.csv" , header=T , sep=";" )
#numeric
data[,-1]=apply(data[,-1],2,as.numeric)
#check no redondancy in genotype name
table(data$geno)[table(data$geno)>1]
## named integer(0)
# put geno name as rowname
rownames(data)=data$geno
data=data[,-1]
# delete columns with only NA
which(apply( data , 2 , function(x) all(is.na(x)) )==TRUE)
##  GRI11_HEI_rep2 GRI11_NANT_rep2  LEC14_HEI_rep2 
##              64              78             210
data=data[ , ! apply( data , 2 , function(x) all(is.na(x)) ) ]

How many different phenotypes do we have?

ncol(data)-1
## [1] 311

How many different genotypes do we have? (counting dic2 and silur)

nrow(data)
## [1] 199

Charge some libraries that will be useful

library(RColorBrewer)
library(xtable)
library(plotly)
library(FactoMineR)

Let’s create a color vector for the study

my_colors = brewer.pal(8, "Set2") 
my_colors = colorRampPalette(my_colors)(15)

Let’s create some objects denoting groups of genotype:

# reps
REP1=grep("rep1", colnames(data) )
REP2=grep("rep2", colnames(data) )
BLUP=grep("blup", colnames(data) )

# experiments
CAP13=grep("CAP13", colnames(data) )
GRI11=grep("GRI11", colnames(data) )
GRI13=grep("GRI13", colnames(data) )
GRI15=grep("GRI15", colnames(data) )
LEC14=grep("LEC14", colnames(data) )
BEQ11=grep("BEQ11", colnames(data) )
BAR14=grep("BAR14", colnames(data) )

# Type de phéno NEPI, NEPIL, PEPIL, NOT, DON
PEPIL=grep("PEPIL", colnames(data) )
PEPI=grep("PEPI", colnames(data))
PEPI=setdiff(PEPI, PEPIL)
NEPIL=grep("NEPIL", colnames(data) )
NOT=grep("NOT", colnames(data) )
DON=grep("DON", colnames(data) )

# AUDPC et derniere notation
AUDPC=grep("AUDPC", colnames(data) )
LAST=AUDPC-1

# agronomical variable
EPI=grep("_EPI" , colnames(data) )
FLO=grep("FLO" , colnames(data) )
HEI=grep("HEI" , colnames(data) )

Let’s check if we have every variable we expect for publication in this table! –> everything is all right!

# CAP13 --> must be 13
a=intersect(CAP13,BLUP) ; a=intersect(a,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 13
# GRI11 --> must be 12
a=intersect(GRI11,BLUP) ; a=intersect(a,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 12
# GRI13 --> must be 17
a=intersect(GRI13,BLUP) ; a=intersect(a,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 17
# GRI15 --> must be 12
a=intersect(GRI15,BLUP) ; a=intersect(a,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 12
# LEC14 --> must be 12 + DON qui n'a pas de blup=13
a=intersect(LEC14,BLUP) ; a=intersect(a,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 12
# BEQ11 --> must be 13
a=intersect(BEQ11,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 13
# BAR14 --> must be 17 / one is useless (NEPIL_blup)
a=intersect(BAR14,BLUP) ; a=intersect(a,c(DON, NOT, PEPI, NEPIL, PEPIL)) ; length(a)
## [1] 17

2/ Experiments description

Locations and year

There are 3 years concerned by this study: 2011, 2013, and 2015. There are 5 location concerned by this study: Montbartier, Lectoure, Cappelle, Grisolles and Monbequi. Finally, there 7 experiments that will be useful in this study. Let’s summarize it in a table.

A=matrix("-",5,4)
rownames(A)=c("Cappelle","Grisolles","Lectoure","Monbequi","Monbartier")
colnames(A)=c("2011", "2013", "2014","2015")
v2011=c("-","Yes","-","Yes","-")
v2013=c("Yes","Yes","-","-","-")
v2014=c("-","-","Yes","-","Yes")
v2015=c("-","Yes","-","-","-")
A[,c(1:4)]=c(v2011,v2013,v2014,v2015)
print(xtable(A), type = "html", include.rownames = T , comment=FALSE)
2011 2013 2014 2015
Cappelle
Yes
Grisolles Yes Yes
Yes
Lectoure
Yes
Monbequi Yes
Monbartier
Yes

Les pressions fongiques sont différentes d’une parcelle à l’autre: - faible: Montbartier 2014, Montbequi - Moyen: Grisolles - Fort : Cappelle

Phenotyping methods

2 types de résistance: - type 1: capacité à toucher un épi (on compte le nbr d’épi touchés) - type2: capacité à se propager dans un épi (on compte le nombre d’épillets touchés)

We have several phenotyping methods used in this study: +Fusariose: ++ resistance - note fusa? - number of spike with fusariose symptom - % of spike with fusariose symptpom - extrusion des antheres=sinon, le champi reste et se dvlp.

++ toxinogénèse=quantité de toxine que le champignon laisse sur le grain - DON=quantité de toxine, mesuré par dosage biochimique. (ultra chèr). Pas sur tous les essais du coup.

+Agronomical features: - flowering data - height - Long_col_epi? - Compacité épi –pourrait etre favorable au déplacement du champi - verse: a mark going from 0 to 8

  • Metabolome: va intervenir en ralentissant la dispersion du champi, soit en limitant la quntité de toxine en surface (DON). On a pris des plantes infectées et des saines. Prélevement de graines immatures. On a pris 40 composés phénoliques potentiellement intéressant. Et on compare au cours du temps entre sain et sensibles pour voir si il y a une différence de concentration de ces composés. Ces 40 composés sont constitutifs: rien a voir avec la présence de champignon. Par contre ces composés phénol évoluent au fil du temp + sont différents entre résistant et sensibles.

See the “table” folder in the dropbox for an extensive description of available phenotypes for each trial and each phenotyping date.

3/ Descriptive analysis?

NOTE

Let’s plot the histogram of every rep1 of the NOTE.
Pas de note pour CAP13.
On observe bien l’augmentation de la maladie dans le temps.
- GRI11 > GRI13 > GRI15 en terme de quantité de symptome.
- BEQ11 Très touché

par(mfrow=c(5,4), mar=c(4,3,1,1))
for(i in c( intersect(NOT, REP1), intersect(NOT, BEQ11) )   ){
  hist( data[,i]  ,  border=F, xlab="" , col=my_colors[6], main="", las=1)
  abline( v=c( data[rownames(data)=="dic2", i], data[rownames(data)=="silur", i]), col=c("blue","red") , lwd=2)
  a=gsub("_rep1", "", colnames(data)[i])
  a=paste(a, " | ", round(mean(data[,i],na.rm=T),2), sep="")
  title( a , col.main="blue" , cex.main=0.7)
  }

PEPI

Pas de PEPI pour GRI11 seulement. On a bien qqchose entre 0 et 100. Observation:
- Not a normal distribution –> importance des blups. - CAP13 / GRI13: for taux d’épi touchés. On constate une net évolution au cours du temps : hist par vers la droite - LEC14 / BAR 14: faible taux d’épi touché, moins de contamination? - Répartition plus uniforme pour les AUDPC

par(mfrow=c(6,4), mar=c(2,3,1,1))
for(i in c( intersect(PEPI, REP1), intersect(PEPI, BEQ11) )  ){
  hist( data[,i]  ,  border=F, xlab="" , col=my_colors[3], main="", las=1)
  abline( v=c( data[rownames(data)=="dic2", i], data[rownames(data)=="silur", i]), col=c("blue","red") , lwd=2)
  a=gsub("_rep1", "", colnames(data)[i])
  a=paste(a, " | ", round(mean(data[,i],na.rm=T),2), sep="")
  title( a, col.main="blue" , cex.main=0.7)
  }

NEPIL

Dispo pour toutes les parcelles.
Observation:

par(mfrow=c(7,5), mar=c(4,3,1,1))
for(i in c( intersect(NEPIL, REP1), intersect(NEPIL, BEQ11) ) ){
  hist( data[,i]  ,  border=F, xlab="" , col=my_colors[4], main="", las=1)
  abline( v=c( data[rownames(data)=="dic2", i], data[rownames(data)=="silur", i]), col=c("blue","red") , lwd=2)
  a=gsub("_rep1", "", colnames(data)[i])
  a=paste(a, " | ", round(mean(data[,i],na.rm=T),2), sep="")
  title( a, col.main="blue" , cex.main=0.7)
  }

PEPIL

par(mfrow=c(5,4), mar=c(4,3,1,1))
for(i in c( intersect(PEPIL, REP1), intersect(PEPIL, BEQ11) )   ){
  hist( data[,i]  ,  border=F, xlab="" , col=my_colors[5], main="", las=1)
  abline( v=c( data[rownames(data)=="dic2", i], data[rownames(data)=="silur", i]), col=c("blue","red") , lwd=2)
  a=gsub("_rep1", "", colnames(data)[i])
  a=paste(a, " | ", round(mean(data[,i],na.rm=T),2), sep="")
  title( a, col.main="blue" , cex.main=0.7)
  }

BLUP

Let’s plot the histogram of every BLUPs of the NOTE.
On doit voir Dic2 et Silur très très proche du coup, plus de diff entre les 2 !

par(mfrow=c(4,4), mar=c(4,3,1,1))
for(i in c( intersect(NOT, BLUP), intersect(NOT, BEQ11) )   ){
  hist( data[,i]  ,  border=F, xlab="" , col=my_colors[6], main="", las=1)
  abline( v=c( data[rownames(data)=="dic2", i], data[rownames(data)=="silur", i]), col=c("blue","red") , lwd=2)
  a=gsub("_rep1", "", colnames(data)[i])
  a=paste(a, " | ", round(mean(data[,i],na.rm=T),2), sep="")
  title( a , col.main="blue" , cex.main=0.7)
  }

4/ PCA

Par parcelle

On va garder la rep1 seulement à chaque fois, et faire une ACP pour chaque expé:

plot_my_PCA=function( exp){
  a=intersect(exp, c(FLO, HEI, PEPI, NEPIL, PEPIL, NOT, DON))
  a=intersect(a, REP1)
  res.PCA = PCA(data[ , a ] , scale.unit=TRUE, ncp=3, graph=F) 
  plot.PCA(res.PCA, axes=c(1, 2), choix="var", cex=0.7, title="" )
}
# CAP13
plot_my_PCA(CAP13)
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# GRI11
plot_my_PCA(GRI11)
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# GRI13
plot_my_PCA(GRI13)
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# GRI15
plot_my_PCA(GRI15)
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# LEC14
plot_my_PCA(LEC14)
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# BEQ11
a=intersect(BEQ11, c(FLO, HEI, PEPI, NEPIL, PEPIL, NOT, DON))
res.PCA = PCA(data[ , a ] , scale.unit=TRUE, ncp=3, graph=F) 
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
plot.PCA(res.PCA, axes=c(1, 2), choix="var", cex=0.7, title="" )

# BAR14
plot_my_PCA(BAR14)
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

Par trait

Pour voir si des expé ont tendance à se ressembler et si les individus résistants sont stables d’une parcelle à l’autre.

# Function for the PCA
replot_my_PCA=function(trait){
  a=c( intersect(trait, REP1), intersect(trait, BEQ11)) 
  b=as.numeric( as.factor( gsub("_.*","", colnames(data)[a]) ))
  res.PCA = PCA(data[ , a ] , scale.unit=TRUE, ncp=3, graph=F) 
  plot.PCA(res.PCA, axes=c(1, 2), choix="var", cex=0.7, title="" , col.var = my_colors[b])
}


# Function to calculate correlation
calculate_cor=function(trait){
  a=c( intersect(trait, REP1), intersect(trait, BEQ11)) 
  a=intersect(a, AUDPC)
  out=round(cor(data[,a] , use="complete.obs"),2)
  colnames(out)=gsub("_.*","",colnames(out))
  rownames(out)=colnames(out)
  diag(out)="-"
  out[lower.tri(out)]="-"
  return(out)
}

Apply it

## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
GRI11 GRI13 GRI15 BEQ11
GRI11
0.35 0.12 0.3
GRI13
0.25 0.28
GRI15
0.24
BEQ11
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
CAP13 GRI13 GRI15 LEC14 BAR14 BEQ11
CAP13
0.3 0.23 0.33 0.2 0.05
GRI13
0.35 0.31 0.34 0.22
GRI15
0.05 0.33 0.14
LEC14
0.12 0.22
BAR14
0.26
BEQ11
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
CAP13 GRI11 GRI13 GRI15 LEC14 BAR14 BEQ11
CAP13
0.12 0.35 0.25 0.38 0.24 0.03
GRI11
0.33 0.11 0.25 0.21 0.34
GRI13
0.23 0.32 0.45 0.36
GRI15
0.05 0.27 0.13
LEC14
0.05 0.06
BAR14
0.38
BEQ11
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
CAP13 GRI13 GRI15 LEC14 BAR14 BEQ11
CAP13
0.3 0.23 0.33 0.2 0.05
GRI13
0.35 0.31 0.34 0.22
GRI15
0.05 0.33 0.14
LEC14
0.12 0.22
BAR14
0.26
BEQ11

Agronomical features

Let’s plot a PCA to see the relationship between variables height, precocity and flowering time. There is a strong relationship between FLO and EPI. These 2 variables are not correlated with height. These 3 variable are very stable among experiments.

res.PCA = PCA(data[ , c(EPI, FLO, HEI) ] , scale.unit=TRUE, ncp=3, graph=F) 
## Warning in PCA(data[, c(EPI, FLO, HEI)], scale.unit = TRUE, ncp = 3, graph
## = F): Missing values are imputed by the mean of the variable: you should
## use the imputePCA function of the missMDA package
ze_col=c( rep(my_colors[1], length(EPI)), rep(my_colors[2], length(FLO)), rep(my_colors[3], length(HEI)) )
plot.PCA(res.PCA, axes=c(1, 2), choix="var", cex=0.7, col.var=ze_col, title="" )

Important Blups

ACP avec les Blups des AUDPC et des dernières notations. Couleur=parcelle.

a=intersect(BLUP,c(AUDPC,LAST))
b=as.numeric( as.factor( gsub("_.*","", colnames(data)[a]) ))
res.PCA = PCA(data[ , a ] , scale.unit=TRUE, ncp=3, graph=F) 
## Warning in PCA(data[, a], scale.unit = TRUE, ncp = 3, graph = F): Missing
## values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
plot.PCA(res.PCA, axes=c(1, 2), choix="var", cex=0.7, title="", col.var = my_colors[b] )

Conclusion

  • FLO et EPI sont très corrélés. On peut en garder un des 2 seulement.
  • Plantes plus haute = plante moins malade. Grosse corrélation négative. –> cafacteur pour les QTLs? Ou dans le calcul des blups?
  • à l’échelle de l’expé, les variable se groupe par date d’observation, pas par méthode de phénotypage! Donc méthode de phéno = hautement correlées. Date d’observation = importante.
  • Grande variance inter expé. Que ce soit variable par variable ou en prenant toutes les variables, on a systématiquement un regroupement par expé très net qui apparait.

5/ Evolution de la maladie

Observation de l’évolution de PEPI pour 50 individus de Montbartier 2014. Il y a 4 dates de notation dispo.

# get dataa of PEPI in bar14
a=intersect(BAR14,PEPI)
a=intersect(a,REP1)
a=setdiff(a,AUDPC)
don=data[,a]
# plot it
par(mar=c(2,3,2,2))
plot(as.numeric(don[1,])~seq(1,4), type="o", pch=20, ylim=c(0,100), xaxt="n", ylab="PEPI (%)", xlab="", bty="n", col=my_colors[5])
mtext(at=c(1:4) ,gsub("BAR14_","",gsub("_rep1","",colnames(don))) , side=1, col="grey")
abline(v=c(1:4), col="grey")
for(i in c(2:50)){
  points(as.numeric(don[i,])~seq(1,4) , type="o", pch=20, col=my_colors[5])
}

Yan Holtz

October 2016